hashing interface #31

mxhbl · 2025-09-01T09:49:23Z

This PR adds a more robust interface for the ghash function that lets the user manually select a hashing algorithm. This is necessary because Julia's Base.hash is not sufficiently collision-resistant: I recently ran into a collision while working with a set of about 80,000 graphs (see also the discussion in #27 and the link therein).
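For a sense of scale, here is a rough birthday-bound estimate. It is only a sketch: it assumes an ideal, uniformly distributed hash, which no concrete hash function is guaranteed to be, and the helper name collision_probability is made up for this illustration.

```julia
# Approximate chance of at least one collision among n distinct inputs
# under an ideal, uniformly distributed `bits`-bit hash (an assumption).
# Uses the standard estimate (number of pairs) / 2^bits, valid when the result is small.
collision_probability(n::Integer, bits::Integer) = n * (n - 1) / 2 / 2.0^bits

n = 80_000
p64 = collision_probability(n, 64)    # on the order of 1e-10
p128 = collision_probability(n, 128)

println("64-bit estimate:  ", p64)
println("128-bit estimate: ", p128)
```

Under that assumption, a collision among 80,000 inputs would be very unlikely even at 64 bits, so observing one suggests the hash is far from ideal on this kind of input.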

Main changes:

- It is now possible to choose from five different hashing algorithms:
  - xxHash, in either 64 or 128 bits. Fast and reasonably secure. This is now the default.
  - The Julia SHA library, in either 64 or 128 bits. This was already used for large graphs, but can now be chosen through the interface. It is fairly slow, but the most secure.
  - The Julia Base.hash, which is still available but not recommended.
- To select the hashing algorithm, there is a simple struct-based interface. For example, to use xxHash with 64 bits, call ghash(g; alg=XXHash64Alg()). Algorithm choice is explained in the docstring of ghash.
- Since the hash values depend on the chosen algorithm, graph hashes are now cached together with the algorithm that was used to compute them. For this there is a simple HashCache type that holds 64-bit and 128-bit hashes together with the algorithms. For now this is internal only, but something like it could also be exported as a convenience/safety layer that checks whether the hash algorithms match before comparing hashes.
- Hashes of identical graphs that are represented with different bitpacking types are no longer the same. This could be changed, but computing hashes in a bitpacking-agnostic way is more expensive. Since changing the bitpacking is not really public interface, and since the bitpacking type should be UInt64 in 99.9% of cases anyway, I think this is not a problem. It may become a problem in the future, though, if we want to compare hashes between a DenseNautyGraph and a SparseNautyGraph.
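The struct-based selection and the bitpacking dependence can be sketched roughly as follows. This is a toy illustration, not the code in this PR: the names toy_ghash, ToyBaseAlg and ToySHA64Alg are hypothetical, the graph is stood in for by a plain vector of packed adjacency rows, and the SHA standard library replaces xxHash to keep the sketch dependency-free.

```julia
using SHA  # Julia standard library

# Hypothetical algorithm structs, named for this sketch only.
abstract type AbstractToyAlg end
struct ToyBaseAlg <: AbstractToyAlg end
struct ToySHA64Alg <: AbstractToyAlg end

# Hash the raw bytes of the packed rows with the algorithm chosen via `alg`.
function toy_ghash(words::Vector{T}; alg::AbstractToyAlg=ToySHA64Alg()) where {T<:Unsigned}
    bytes = collect(reinterpret(UInt8, words))
    if alg isa ToySHA64Alg
        # Truncate the SHA-256 digest to its first 8 bytes.
        return first(reinterpret(UInt64, sha256(bytes)))
    elseif alg isa ToyBaseAlg
        return UInt64(hash(bytes))
    else
        throw(ArgumentError("unsupported hash algorithm: $(typeof(alg))"))
    end
end

# The same three adjacency rows, packed into UInt8 words and into UInt64 words.
rows8 = UInt8[0b011, 0b101, 0b110]
rows64 = UInt64.(rows8)

h8 = toy_ghash(rows8)                        # hashes 3 bytes
h64 = toy_ghash(rows64)                      # hashes 24 bytes
hbase = toy_ghash(rows64; alg=ToyBaseAlg())  # explicit algorithm choice

println(h8 == h64)
```

Because the hash is taken over the raw bytes, the word type changes the input to the hash function even though the graph is the same. Making the result independent of the word type would need an extra normalization pass over the data.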

Notes:

Even though using xxHash requires a few more allocations, it is generally faster than the Base Julia hash.
There is a Julia wrapper for the xxHash library, but it is quite a heavy dependency (it depends on CBinding.jl, which takes quite long to precompile on my machine). Since I only need a tiny subset of the functionality of xxHash, I am depending on xxHash_jll directly and calling the C interface myself.
In the new implementation of ghash I am not using multiple dispatch to select the hash algorithm, but instead I use type checks on the algorithm structs, which should compile away. @Krastanov: I guess this should be equivalent, but do you think it is better/safer to use multiple dispatch?
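To make the question concrete, here is a minimal sketch of the two styles side by side. All names (ToyBaseAlg, ToySHA64Alg, ToyOtherAlg, digest_branch, digest_dispatch) are hypothetical and are not the ones used in this PR.

```julia
using SHA  # Julia standard library

abstract type AbstractToyAlg end
struct ToyBaseAlg <: AbstractToyAlg end
struct ToySHA64Alg <: AbstractToyAlg end
struct ToyOtherAlg <: AbstractToyAlg end  # deliberately unsupported

# Style 1: type checks inside a single function. When the concrete type of
# `alg` is known at compile time, the untaken branches can be eliminated.
function digest_branch(bytes::Vector{UInt8}, alg::AbstractToyAlg)
    if alg isa ToyBaseAlg
        return UInt64(hash(bytes))
    elseif alg isa ToySHA64Alg
        return first(reinterpret(UInt64, sha256(bytes)))
    else
        throw(ArgumentError("unsupported hash algorithm: $(typeof(alg))"))
    end
end

# Style 2: one method per algorithm, selected by multiple dispatch.
digest_dispatch(bytes::Vector{UInt8}, ::ToyBaseAlg) = UInt64(hash(bytes))
digest_dispatch(bytes::Vector{UInt8}, ::ToySHA64Alg) = first(reinterpret(UInt64, sha256(bytes)))

bytes = UInt8[1, 2, 3]
same_sha = digest_branch(bytes, ToySHA64Alg()) == digest_dispatch(bytes, ToySHA64Alg())
same_base = digest_branch(bytes, ToyBaseAlg()) == digest_dispatch(bytes, ToyBaseAlg())

# The two styles differ in how an unsupported algorithm fails.
branch_error = try digest_branch(bytes, ToyOtherAlg()) catch e; e end
dispatch_error = try digest_dispatch(bytes, ToyOtherAlg()) catch e; e end
```

For supported algorithms both styles return the same value. The visible differences are that the dispatch version fails with a MethodError rather than a custom error message, and that downstream code can add a new algorithm by defining another method, whereas the branch version has to be edited in place.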

mxhbl · 2025-09-11T08:10:26Z

Thinking about this a bit more, the tiny performance gains from caching hashes are not worth it. The latest commit removes hash caching and simply stores an iscanon flag for every graph that records whether the graph is in canonical form. If it is canonical, we do not need to call nauty before hashing or isomorphism checking. The user can also query this via the new iscanon(::AbstractNautyGraph) function.
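The idea behind the flag can be sketched as follows. This is a toy model under loud assumptions: ToyGraph, canonical_rows, canonize!, add_row! and toy_hash are hypothetical names, sorting the rows is only a stand-in for the nauty canonicalization, and iscanon is defined here on the toy type rather than on AbstractNautyGraph.

```julia
# Toy graph: packed rows plus a flag recording whether they are in canonical form.
mutable struct ToyGraph
    rows::Vector{UInt64}
    iscanon::Bool
end
ToyGraph(rows) = ToyGraph(rows, false)

iscanon(g::ToyGraph) = g.iscanon

# Counts how often the (expensive) canonicalization stand-in runs.
const CANON_CALLS = Ref(0)

function canonical_rows(g::ToyGraph)
    CANON_CALLS[] += 1
    return sort(g.rows)  # stand-in for the nauty call
end

# Put the graph into canonical form in place and set the flag.
function canonize!(g::ToyGraph)
    iscanon(g) && return g
    g.rows = canonical_rows(g)
    g.iscanon = true
    return g
end

# Any modification invalidates the flag.
function add_row!(g::ToyGraph, row::UInt64)
    push!(g.rows, row)
    g.iscanon = false
    return g
end

# Hashing skips canonicalization when the graph is already canonical.
toy_hash(g::ToyGraph) = hash(iscanon(g) ? g.rows : canonical_rows(g))

g = ToyGraph(UInt64[3, 1, 2])
h1 = toy_hash(g)               # canonicalizes a temporary copy
canonize!(g)                   # canonicalizes in place, sets the flag
calls_before = CANON_CALLS[]
h2 = toy_hash(g)               # flag is set, no further canonicalization
calls_after = CANON_CALLS[]
```

In this sketch the hash of the canonical graph equals the hash computed before canonizing, and the second hash call does not run the canonicalization again. Every mutating operation must reset the flag, which is the main invariant such a design has to maintain.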

mxhbl and others added 3 commits September 1, 2025 11:23

hashing interface

c6166a1

Merge branch 'main' into hashing

83c7a22

remove hashcache, add iscanon

7e462b3
